Knowledge and Information Systems, Volume 48, Issue 1, pp 29–54

Multi-graph-view subgraph mining for graph classification

  • Jia Wu
  • Zhibin Hong
  • Shirui Pan
  • Xingquan Zhu
  • Zhihua Cai
  • Chengqi Zhang
Regular Paper

Abstract

In this paper, we formulate a new multi-graph-view learning task, where each object to be classified contains graphs from multiple graph-views. This problem setting is fundamentally different from traditional single-graph-view graph classification, where graphs are collected from a single feature view. To solve the problem, we propose a cross graph-view subgraph feature-based learning algorithm that explores an optimal set of subgraphs, across multiple graph-views, as features to represent graphs. Specifically, we derive an evaluation criterion to estimate the discriminative power and redundancy of subgraph features across all views, and we propose a branch-and-bound algorithm to prune the subgraph search space. Because graph-views may complement each other and play different roles in a learning task, we assign each view a weight indicating its importance to the learning task and use an optimization process to find the optimal weight for each graph-view. Iterating between cross graph-view subgraph scoring and graph-view weight updating forms a closed loop that finds optimal subgraphs to represent graphs for multi-graph-view learning. Experiments and comparisons on real-world tasks demonstrate the algorithm’s superior performance.
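The abstract describes the search and the closed loop only at a high level. The following is a toy sketch under explicit assumptions, not the authors' criterion or algorithm. Itemsets stand in for subgraph patterns; the pruning only relies on the fact that the set of objects containing a pattern can only shrink as the pattern grows, which also holds for subgraphs. The per-view score (the absolute difference between a pattern's positive-class and negative-class support, multiplied by the view weight) is a hypothetical stand-in, the redundancy term is omitted, and the weight-update rule (each view's weight proportional to the unweighted scores of its selected patterns) is likewise hypothetical. Because a super-pattern's positive and negative supports never exceed its parent's, max(a/P, b/N) bounds the score of every descendant, so a branch whose bound cannot beat the current k-th best is cut.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple

Pattern = FrozenSet[str]  # hypothetical stand-in for a subgraph pattern


def occurrences(pattern: Pattern, transactions: Sequence[Set[str]]) -> List[int]:
    """Indices of objects whose (stand-in) graph contains the pattern."""
    return [i for i, t in enumerate(transactions) if pattern <= t]


def score_and_bound(occ: List[int], labels: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical score |a/P - b/N| plus an upper bound valid for every
    super-pattern, since both supports can only shrink as a pattern grows."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    a = sum(1 for i in occ if labels[i] == 1) / n_pos
    b = sum(1 for i in occ if labels[i] != 1) / n_neg
    return abs(a - b), max(a, b)


def mine_view(transactions, labels, weight: float, k: int):
    """Branch-and-bound top-k search over one view's pattern lattice."""
    items = sorted(set().union(*transactions))
    best: List[Tuple[float, Pattern]] = []

    def kth_best() -> float:
        return best[-1][0] if len(best) == k else 0.0

    def grow(pattern: Pattern, start: int) -> None:
        # Set-enumeration tree: each pattern is generated exactly once.
        for j in range(start, len(items)):
            child = pattern | {items[j]}
            occ = occurrences(child, transactions)
            if not occ:
                continue  # no object contains it, nor any super-pattern
            s, ub = score_and_bound(occ, labels)
            s, ub = weight * s, weight * ub
            if len(best) < k or s > kth_best():
                best.append((s, child))
                best.sort(key=lambda e: -e[0])
                del best[k:]
            if ub > kth_best():  # otherwise prune the whole branch
                grow(child, j + 1)

    grow(frozenset(), 0)
    return best


def multi_view_loop(views, labels, k: int = 3, max_iter: int = 10, tol: float = 1e-6):
    """Alternate weighted cross-view top-k selection with a hypothetical
    view-weight update until the weights stop changing."""
    weights = [1.0 / len(views)] * len(views)
    selected: List[Tuple[float, int, Pattern]] = []
    for _ in range(max_iter):
        candidates = [(s, v, p)
                      for v, trans in enumerate(views)
                      for s, p in mine_view(trans, labels, weights[v], k)]
        candidates.sort(key=lambda e: -e[0])
        selected = candidates[:k]
        # Hypothetical update: weight proportional to the unweighted scores
        # of the view's selected patterns (a small floor keeps weights > 0).
        raw = [1e-9] * len(views)
        for s, v, _ in selected:
            raw[v] += s / weights[v]
        total = sum(raw)
        new_weights = [r / total for r in raw]
        converged = max(abs(x - y) for x, y in zip(weights, new_weights)) < tol
        weights = new_weights
        if converged:
            break
    return weights, selected


labels = [1, 1, -1, -1]
views = [
    [{"a"}, {"b"}, {"a"}, {"b"}],  # view 0: uninformative
    [{"x"}, {"x"}, {"y"}, {"y"}],  # view 1: separates the classes
]
weights, selected = multi_view_loop(views, labels, k=2)
print([round(w, 3) for w in weights])             # [0.0, 1.0]
print(sorted(sorted(p) for _, _, p in selected))  # [['x'], ['y']]
```

On this toy input the loop shifts almost all weight to the view whose patterns separate the classes. With the self-reinforcing update used here, a view that loses early tends to stay suppressed, which is one reason a real criterion would also need the redundancy and complementarity considerations the abstract mentions.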

Keywords

Multi-graph-view, Feature selection, Subgraph mining, Graph classification

Notes

Acknowledgments

The work was supported by the Key Project of the Natural Science Foundation of Hubei Province, China (Grant No. 2013CFA004), the National Scholarship for Building High Level Universities, China Scholarship Council (No. 201206410056), National Natural Science Foundation of China (Grant Nos. 61403351 and 61370025), and the Chinese National “111” Project hosted by SA Centre for Big Data Research in Renmin University of China. It is also partially supported by the Australian Research Council Discovery Projects under Grant Nos. DP140100545 and DP140102206.

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Jia Wu (1, 2)
  • Zhibin Hong (1)
  • Shirui Pan (1)
  • Xingquan Zhu (3)
  • Zhihua Cai (2)
  • Chengqi Zhang (1)
  1. Quantum Computation and Intelligent Systems (QCIS) Centre, FEIT, University of Technology Sydney, Ultimo, Australia
  2. School of Computer Science, China University of Geosciences, Wuhan, China
  3. Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, USA

Personalised recommendations